Multilingual Training and Cross-lingual Adaptation on CTC-based Acoustic Model
Abstract
Phoneme-based multilingual training and different cross-lingual adaptation techniques for Automatic Speech Recognition (ASR) are explored in Connectionist Temporal Classification (CTC)-based systems. The multilingual model is trained to model a universal IPA-based phone set using the CTC loss function. Because the same IPA symbol may not correspond to acoustically similar sounds across languages, Learning Hidden Unit Contribution (LHUC) is investigated. Given the multilingual model, different approaches are compared for adapting it to a target language with limited adaptation data. In addition, dropout during cross-lingual adaptation is studied as a way to mitigate overfitting. Experiments show that the performance of the universal phoneme-based CTC system can be improved by applying LHUC, and that the system is extensible to new phonemes during cross-lingual adaptation. Updating all the parameters shows consistent improvement on limited data. Applying dropout during adaptation can further improve the system and achieve performance competitive with Deep Neural Network (DNN)/Hidden Markov Model (HMM) systems, even with only 21 hours of data.
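The abstract does not spell out how LHUC is parameterised, so the following is only a minimal sketch under assumptions. It uses the commonly cited LHUC form, in which each hidden unit's output is rescaled by an amplitude a = 2 * sigmoid(r) with one learned vector r per language. The function name `lhuc_scale`, the language keys, and all numeric values are hypothetical and are not taken from the paper.

```python
import math

def lhuc_scale(hidden, r):
    """Rescale each hidden unit by an amplitude a = 2 * sigmoid(r), so 0 < a < 2."""
    return [h * 2.0 / (1.0 + math.exp(-ri)) for h, ri in zip(hidden, r)]

# Hypothetical language-dependent LHUC parameters, one value per hidden unit.
# r = 0 gives an amplitude of exactly 1, i.e. the shared model is left unchanged.
lhuc_params = {
    "lang_a": [0.0, 0.0, 0.0],
    "lang_b": [1.0, -1.0, 0.0],
}

# Hypothetical hidden-layer activations produced by the shared multilingual model.
hidden = [0.5, -0.2, 0.8]

scaled_a = lhuc_scale(hidden, lhuc_params["lang_a"])
scaled_b = lhuc_scale(hidden, lhuc_params["lang_b"])

print(scaled_a)
print(scaled_b)
```

In this sketch the shared weights stay language-independent and only the small vector r differs per language, which is one way the same IPA symbol could be given language-specific acoustic behaviour.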
Similar resources
Pronunciation and Acoustic Model Adaptation for Improving Multilingual Speech Recognition
In this paper, we address the importance of pronunciation and acoustic model adaptation in multilingual speech recognition. When aiming at modeling several languages simultaneously, the degree of speaker and language variability is even greater than when concentrating on only one language. To compensate for the pronunciation variability across various speakers, bi-lingual pronunciation modeling is p...
2016 BUT Babel System: Multilingual BLSTM Acoustic Model with i-Vector Based Adaptation
The paper provides an analysis of BUT automatic speech recognition (ASR) systems built for the 2016 IARPA Babel evaluation. The IARPA Babel program concentrates on building ASR systems for many low-resource languages, where only a limited amount of transcribed speech is available for each language. In such a scenario, we found it essential to train the ASR systems in a multilingual fashion. In this w...
Multilingual phone clustering for recognition of spontaneous indonesian speech utilising pronunciation modelling techniques
In this paper, a multilingual acoustic model set derived from English, Hindi, and Spanish is utilised to recognise speech in Indonesian. In order to achieve this task we incorporate a two-tiered approach to perform the cross-lingual porting of the multilingual models to a new language. In the first stage, we use an entropy-based decision tree to merge similar phones from different languages int...
An unified and automatic approach of Mandarin HTS system
Most studies on Mandarin HTS (HMM-based text-to-speech system) have taken the initial/final as the basic acoustic units. It is, however, challenging to develop a multilingual HTS in a uniform and consistent way since most other languages use the phoneme as the basic phonetic unit. It becomes hard to apply cross-lingual adaptation, which needs to map phonemes to each other, particularly in the...
Journal:
- CoRR
Volume: abs/1711.10025
Publication date: 2017